Abstract

Using a spatial interaction modeling approach, we investigate the German collegiate social network site StudiVZ. We focus on identifying factors that foster strong inter-institutional linkages, testing whether the acquaintanceship rate between institutions of higher education is related to various geographic and institutional attributes. We find that acquaintanceship is most significantly related to geographic separation: measuring distance with automobile travel time, acquaintanceship drops by 91% for each additional 100 minutes. Institution type and the former East-West German divide are also related to this rate with statistical significance.

Keywords: online social networks, spatial interaction modeling, generalized 

1. Introduction

Recent research on social network data indicates that the strength of flow between cities can be approximated by a gravity law. That is, the flow between a pair of cities (which may represent people, communication, goods, or money) is proportional to the product of their sizes divided by the square of their distance (Lambiotte et al., 2008; Krings et al., 2009). However, flows between cities depend upon more than their sizes and the distance that separates them; they also depend upon cultural factors such as language, region, or economic similarity. For example, the work of Blondel et al. (2008) suggests that the Belgian mobile communication network splits rather cleanly into two large-scale communities: French and Dutch speakers. Using the network of flows of currency between cities, Brockmann (2010) found that flows do not continuously decrease as a function of distance, but rather are affected by sharp cultural boundaries embedded in geographic space.
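
Written out, the gravity law described above takes the following form. Here $m_i$ and $m_j$ stand for the sizes of cities $i$ and $j$, $r_{ij}$ for their distance, and $K$ for a proportionality constant; this notation is ours, chosen only for illustration. Taking logarithms turns the relation into one that is linear in the log-sizes and log-distance:

$$F_{ij} = K\,\frac{m_i\,m_j}{r_{ij}^{2}} \quad\Longleftrightarrow\quad \ln F_{ij} = \ln K + \ln m_i + \ln m_j - 2\ln r_{ij}.$$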
Given a social network that contains data on both geographic location and various other node attributes such as language, it is natural to hypothesize that some node attribute is related to geographic flow. In this paper, we argue that spatial interaction models are useful for statistically testing such hypotheses. Spatial interaction modeling focuses on estimating the variation of flows between locations in geographic space, such as regions or cities (see, for example, Sen and Smith, 1995). While spatial interaction modeling is widely used in the fields of economics and economic geography to understand and predict interaction behavior in a system of spatial units (for an overview, see Fotheringham and O'Kelly, 1989), it has only rarely been applied to social network data (Scherngell and Barber, 2009).

We use spatial interaction models to investigate data from the social network site (SNS) StudiVZ, a collegiate online social network popular in German-speaking countries. In particular, we consider acquaintanceships between students at the University of Bielefeld (UniBi) and students at other institutions. We determine how various geographic and institutional attributes, including institution type and region, affect the number of acquaintanceships between these institutions.

We use these attributes to define several measures of separation between institutions and assess their importance by comparison with the data. We consider two similar models, the first with fewer and coarser separation measures, and the second with more and finer separation measures. For both models, we find that the most pronounced separation measure is geographic distance, with institution type (e.g., university, technical college, art school) also playing an important and significant role. Regional factors play a less pronounced, yet statistically significant, role.

In Section 2, we give an overview of StudiVZ and describe how we collected the data on its linkage structure. We begin Section 3 by briefly introducing spatial interaction modeling in general; we then explain the details of the two particular models that we propose. Estimation results for the models are presented in Section 4. We summarize and discuss our results in Section 5.



2. StudiVZ 



2.1. Overview 

StudiVZ is a social network site for German post-secondary students that was created in October 2005. As reported by Bakst (2006), StudiVZ was inspired by and intentionally imitated the leading U.S. social networking service for students, Facebook, and the site therefore shares many features with Facebook. Within months of its creation, StudiVZ became the dominant social networking service for students in Germany, Austria, and regions of Switzerland.

Social network sites have been defined by Boyd and Ellison (2007) as web services that allow individuals to (1) construct a public or semi-public profile, (2) articulate a list of other users with whom they are connected, and (3) view and traverse connections made by others. In addition to these basic capabilities, important functions of StudiVZ and most other SNSs include semi-public photo-sharing, "walls" (i.e., message boards), and status updates, all linked to individuals' profile pages. Users of SNSs are often highly engaged; for example, at the time we collected data, StudiVZ users averaged about one visit per user per day, and about 30 page impressions per visit (Hensen, 2008).

While one might expect much of the interaction between users of these sites to take place between people who have met only on the internet (as is often the case for message boards and chat rooms), Boyd and Ellison note that the relationships articulated in SNSs tend to have an offline origin. This claim has been supported by the findings of Mayer and Puller (2008), who report that only 0.4% of the Facebook friendships they studied appear to have originated as "merely online friendships," as well as by other empirical studies (Haythornthwaite, 2005; Lampe et al., 2006; Ellison et al., 2007). This offline basis suggests that conclusions based on SNS data are more likely to generalize to everyday, personal social interaction than conclusions based on data from other online services such as chat rooms or message boards, where social interaction is often anonymous.

Despite the correspondence between online and offline social contact, it is difficult to interpret the meaning of SNS relationships, labeled in StudiVZ as "friendships." In the StudiVZ data that we analyze in this paper, over half of users listed more than 48 "friends," and over a quarter of users had more than 86 friends. In a recent dataset from the more mature Facebook, where users have had more time to accumulate friends, over half of users listed more than 100 friends (Lewis et al., 2008). Such large degrees indicate that many of these so-called friendships are not close, active friendships, but rather latent friendships or acquaintanceships, such as old high-school friends or colleagues from previous jobs. Nevertheless, for the rest of the paper, all of these ties will be referred to as friendships.

Within the StudiVZ system, users associate themselves with the institutions they attend. When we collected data in January 2008, StudiVZ reported 4.5 million members (Hensen, 2008). Considering that a total of 2.47 million students were enrolled at higher-learning institutions in Germany, Austria, and Switzerland in the academic year 2007/2008,² it appears not only that StudiVZ had largely saturated its target audience of students, but also that the site had attracted users from other audiences, perhaps former students, exchange students, and young people who were not students. These users presumably associated themselves with institutions that they did not attend.



2.2. Data Collection 

We retrieved data in late January 2008. The data relevant to this paper are the friendship data, which at the time of collection were publicly visible for all profiles. As mentioned above, a "friendship" in StudiVZ is the primary indicator of relation, and is formed when one user asks another user to confirm the friendship. The information concerning the direction of the request (i.e., who requested whom) is not public, and therefore we treat all friendships as undirected.

² In the academic year 2007/2008, Germany reported 1.97 million post-secondary students (Hochschulrektorenkonferenz, 200^), Austria reported 272,003 (Gumpoldsberger and Nitsch, 2009), and Switzerland reported 225,862 (Bundesamt für Statistik, 2009).

We collected the friendship data using snowball sampling, such that we were left with all friendship data of UniBi users in the giant connected component of StudiVZ's UniBi subgraph. This data set included 29,192 users. The users we identified at UniBi listed over 1.4 million friendships. Around 350,000 of these friendships are among the 29,192 UniBi users; we refer to these as internal friendships. Approximately 1.05 million friendships were between UniBi students and some 367,000 students associated with other higher-learning institutions; we call these external friendships. Because StudiVZ enjoyed such widespread popularity among German college students at the time of collection, we contend that a considerable proportion of all acquaintanceships between the students of the University of Bielefeld and students at other institutions is included in the external friendships.
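
The collection step can be pictured as a breadth-first snowball crawl that only expands through UniBi users, followed by a split of the collected edges into internal and external friendships. The sketch below is our own illustration, not the crawler actually used; the input dictionaries `friends` and `institution` and the label `"UniBi"` are hypothetical stand-ins for the scraped profile data, and the seed is assumed to lie in the giant component.

```python
from collections import deque


def snowball_unibi(seed, friends, institution, home="UniBi"):
    """Breadth-first snowball crawl that only expands through `home` users.

    friends:     dict, user id -> list of friend ids (hypothetical input)
    institution: dict, user id -> institution label (hypothetical input)
    Returns the set of `home` users in the same connected component of the
    `home` subgraph as `seed` (the giant component if seed lies in it).
    """
    visited = {seed}
    queue = deque([seed])
    while queue:
        user = queue.popleft()
        for friend in friends.get(user, ()):
            # Only users at the home institution are crawled further.
            if friend not in visited and institution.get(friend) == home:
                visited.add(friend)
                queue.append(friend)
    return visited


def split_friendships(members, friends, institution, home="UniBi"):
    """Split the crawled users' friendships into internal and external sets.

    Edges are stored as frozensets, i.e. undirected, because the direction
    of a friendship request is not public.
    """
    internal, external = set(), set()
    for user in members:
        for friend in friends.get(user, ()):
            edge = frozenset((user, friend))
            if institution.get(friend) == home:
                internal.add(edge)
            else:
                external.add(edge)
    return internal, external
```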

Here we examine only the external friendships aggregated at the level of institutions of higher education (henceforth simply "institutions"). More specifically, a weight was determined for the connection between each of 304 German institutions and UniBi. The weight between UniBi and some other institution corresponds to the number of friendships between all StudiVZ users at UniBi and all StudiVZ users at that institution.

We exclude from consideration the four nearest institutions, each with a travel time of under thirty minutes to UniBi, because these institutions have special relationships with UniBi (e.g., shared dormitories, libraries, and presumably other, unknown arrangements). These relationships are not accounted for in our model, leading it to drastically under-predict the rate of friendship with these four institutions. Other institutions that are nearby, but outside this thirty-minute limit, do not show a marked mismatch with the model predictions. Additionally, we consider only institutions in Germany, for which information on addresses, institution type, and enrollment size is consistently available from the website of the German Rectors' Conference.³
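
The aggregation and exclusion rules above can be sketched as follows. This is a hypothetical illustration: the input mappings (`institution`, `travel_minutes`, `country`) and the country code `"DE"` are our assumptions about how such data might be stored, and each external edge is assumed to join exactly one UniBi user to one user elsewhere.

```python
from collections import Counter


def institution_weights(external, institution, travel_minutes, country,
                        home="UniBi", min_minutes=30, keep_country="DE"):
    """Aggregate external friendships into one weight per institution.

    external:       iterable of 2-element frozensets {home user, other user}
    institution:    user id -> institution label
    travel_minutes: institution label -> drive time to the home institution
    country:        institution label -> country code
    Institutions outside `keep_country`, without a travel time, or closer than
    `min_minutes` are dropped, mirroring the exclusions described above.
    """
    weights = Counter()
    for edge in external:
        u, v = tuple(edge)
        other = v if institution[u] == home else u
        weights[institution[other]] += 1
    return {
        inst: w
        for inst, w in weights.items()
        if country.get(inst) == keep_country
        and inst in travel_minutes
        and travel_minutes[inst] >= min_minutes
    }
```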



3. Spatial Interaction Model Definition 



To investigate the StudiVZ network, we adopt a spatial interaction modeling perspective. Spatial interaction modeling focuses on estimating the variation of flows between locations in geographic space, such as regions or cities (see, for example, Sen and Smith, 1995). These flows, known as spatial interactions, can take many forms, including people, goods, money, or knowledge. Spatial interaction modeling is widely used in the fields of economics and economic geography to understand and predict interaction behavior in a system of spatial units; for an overview, see Fotheringham and O'Kelly (1989). Here, we investigate flows of social interactions, in the form of online friendships.



³ http://www.hochschulkompass.de/hochschulen/download.html
The specific technique we use is that of generalized linear models. The usual approach is to assume that the spatial interaction $F_{ij}$ between an origin location $i$ and a destination location $j$ has a product form given by

$$F_{ij} = A_i B_j S_{ij}, \tag{1}$$

where $A_i$ is the origin function, $B_j$ is the destination function, and $S_{ij}$ is the separation function. As StudiVZ friendships are inherently symmetric, we can without loss of generality assume that UniBi is always the destination, reducing the model to a one-dimensional version given by

$$F_i = c\,A_i S_i. \tag{2}$$

In Eq. (2), $F_i$ and $S_i$ are the source-only analogs of $F_{ij}$ and $S_{ij}$, respectively. The destination function $B_j$ has been replaced by a constant term $c$, reflecting the contribution from UniBi. As $A_i$ and $S_i$ both have the same dependence on $i$, the two terms could be merged into one, but we treat them separately to maintain the distinction between properties of just the origin institution and properties dependent upon the separation between the origin and destination (i.e., UniBi) institutions.

We take the origin function $A_i$ to be

$$A_i = a_i^{\alpha}, \tag{3}$$

where $a_i$ is the number of students enrolled at institution $i$ during the 2007/2008 academic year, accounting for differences in the sizes of the institutions, and $\alpha$ is a parameter to be determined. We take the separation function $S_i$ to include $N_d$ measures of separation, with a general form of

$$S_i = \prod_{k=1}^{N_d} \exp\left(\beta^{(k)} d_i^{(k)}\right). \tag{4}$$

The parameters $\beta^{(k)}$ weigh the various separation measures $d_i^{(k)}$ against one another. For notational convenience, we similarly rewrite the destination constant as an exponential, $c = \exp\gamma$.
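
Putting Eqs. (2) to (4) together, the expected flow is a log-linear function of the covariates, which is what makes the model a generalized linear model with a log link. A minimal sketch follows; the function and argument names are hypothetical, and the numbers in the example are purely illustrative, not estimates from this study.

```python
import math


def expected_friendships(enrollment, separation, gamma, alpha, betas):
    """Mean flow of Eq. (2), F_i = c * A_i * S_i, with
    c = exp(gamma), A_i = a_i ** alpha (Eq. 3) and
    S_i = prod_k exp(beta_k * d_k) (Eq. 4).

    On the log scale this is linear in the parameters:
    log F_i = gamma + alpha * log(a_i) + sum_k beta_k * d_k.
    """
    if len(separation) != len(betas):
        raise ValueError("need exactly one beta per separation measure")
    eta = gamma + alpha * math.log(enrollment)  # linear predictor
    eta += sum(b * d for b, d in zip(betas, separation))
    return math.exp(eta)


# Illustrative numbers only: c = 2, a_i = 100, alpha = 0.5, and two
# separation measures d = (1, 0) with weights beta = (log 0.5, 3.0),
# so the mean is about 2 * 100**0.5 * 0.5 = 10.
print(expected_friendships(100, [1, 0], math.log(2), 0.5, [math.log(0.5), 3.0]))
```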

With the origin and separation functions chosen as in Eqs. (3) and (4), the spatial interaction model in Eq. (2) is a generalized linear model. A common approach for determining the model parameters is standard OLS estimation (Bergkvist and Westin, 1997). However, as the $F_i$ are non-negative integers, OLS estimation is inappropriate: it is equivalent to assuming that the residuals $F_i - cA_iS_i$ for Eq. (2) are normally distributed. In the present case, the discrete data-generating process would then be approximated by a misrepresentative continuous process. Instead, we assume a negative binomial model specification

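$$P(F_i = f_i) = \frac{\Gamma(f_i + \delta^{-1})}{\Gamma(f_i + 1)\,\Gamma(\delta^{-1})}\left(\frac{\delta^{-1}}{c A_i S_i + \delta^{-1}}\right)^{\delta^{-1}}\left(\frac{c A_i S_i}{c A_i S_i + \delta^{-1}}\right)^{f_i}, \tag{5}$$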
where $\Gamma(\cdot)$ denotes the Gamma function and $\delta$ is a dispersion parameter; in the limit $\delta \to 0$, Eq. (5) reduces to a Poisson model specification. We estimate the model parameters by maximum likelihood using the Newton-Raphson procedure.⁴ For more detail on the derivation and properties of negative binomial models like that in Eq. (5), see Cameron and Trivedi (1998).
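
As a numerical check on Eq. (5), the sketch below evaluates its logarithm with `math.lgamma`, which avoids overflow in the Gamma functions, writing `mu` for the mean $cA_iS_i$. It assumes Eq. (5) is the NB2 form of Cameron and Trivedi (1998), whose variance is $\mu + \delta\mu^2$; the function names are our own. For very small `delta` the result approaches the Poisson log-probability, mirroring the $\delta \to 0$ limit stated above.

```python
import math


def nb_log_pmf(f, mu, delta):
    """Log-probability of count f under Eq. (5) (NB2 form).

    mu    : mean c * A_i * S_i (> 0)
    delta : dispersion parameter (> 0); the delta^{-1} terms become r below
    """
    r = 1.0 / delta
    return (math.lgamma(f + r) - math.lgamma(f + 1) - math.lgamma(r)
            + r * math.log(r / (mu + r))
            + f * math.log(mu / (mu + r)))


def poisson_log_pmf(f, mu):
    """Log-probability of count f under a Poisson model with mean mu."""
    return f * math.log(mu) - mu - math.lgamma(f + 1)


# Probabilities sum to one, the mean is mu, and the variance exceeds mu.
probs = [math.exp(nb_log_pmf(f, 5.0, 0.5)) for f in range(500)]
mean = sum(f * p for f, p in enumerate(probs))
var = sum((f - mean) ** 2 * p for f, p in enumerate(probs))
print(round(sum(probs), 6), round(mean, 6), round(var, 6))  # 1.0 5.0 17.5
```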

The separation function constitutes the core of a spatial interaction model, and is of central importance for the research questions of the current study. We use a multivariate exponential separation function (Sen and Smith, 1995), as defined in Eq. (4), which provides a flexible framework for representing different kinds of separation. In the current paper, we focus on $N_d = 10$ separation measures.

We distinguish between geographic separation effects, separation effects related to the federal states of Germany (henceforth simply states), and institutional separation effects. Geographic separation effects are captured by $d_i^{(1)}$, the logarithm of the automobile travel time⁵ in minutes between UniBi and the other institutions. We augment this with $d_i^{(2)}$, a dummy variable that measures separation effects related to the states of Germany. It is defined as a binary variable set to one if institution $i$ is located in North Rhine-Westphalia, the same state as UniBi, and zero otherwise.

Further, we take into account separation effects related to the type of the institutions. The dummy variable $d_i^{(3)}$ accounts for the type of institution, with its value set to one if institution $i$ is the same type of institution as UniBi, and zero otherwise. Here we use information on the organization of the German post-secondary education system, which contains several types of institutions. Our dataset includes general universities;⁶ Fachhochschulen (officially translated as "universities of applied sciences," and roughly described as technical colleges or trade schools; these will be referred to as "applied universities"); religious institutions; art schools, including music schools; teacher colleges; and private institutions.

With this information, we further refine $d_i^{(3)}$ into separate dummy variables for each institution type, more precisely reflecting how each type of institution affects interaction probability. Thus, we set a value of one for $d_i^{(4)}$ if institution $i$ is an applied university, for $d_i^{(5)}$ if institution $i$ is a religious institution, for $d_i^{(6)}$ if institution $i$ is an art school, for $d_i^{(7)}$ if institution $i$ is a teacher college, for $d_i^{(8)}$ if institution $i$ is a private institution, and for $d_i^{(9)}$ if institution $i$ is a general university.

⁴ For parameter estimation, we use the open source statistical software R (version R-2.12.0).

⁵ We selected travel time because it corresponds to the geographic separation experienced in personal interactions. Other measures of geographic separation include the great circle distance or driving distance between the institutions; these are strongly correlated with the travel time. To calculate the travel time between the University of Bielefeld and another institution, we queried Google Maps (http://maps.google.de/) for the drive time between the University of Bielefeld's address and the other institution's address using the default routing settings.

⁶ We define "general universities" to be public institutions that have the right to grant doctoral degrees and that offer a full range of faculties. For example, we categorize technical universities as general universities, but not dedicated medical schools, applied technical colleges, art schools, or institutions that are private, religious, or run by the military. UniBi itself is a general university and is located in former West Germany.

Due to the enormous political and institutional changes during the reunification of the Federal Republic of Germany and the German Democratic Republic in 1990, we considered a variable that reflects predominant barriers between these two former blocs. Thus, we introduce $d_i^{(10)}$, a dummy variable that is set to one if institution $i$ is located in the former German Democratic Republic, and zero otherwise.
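
The institution-type and former-GDR dummies of the extended model could be encoded as below. The type labels and the dictionary layout are hypothetical; the sketch takes a set of type labels and sets each indicator independently, so it makes no assumption about whether the types are mutually exclusive.

```python
# Hypothetical type labels mapped to the index k of the dummy d_i^(k).
TYPE_DUMMIES = {
    "applied": 4,    # applied university (Fachhochschule)
    "religious": 5,  # religious institution
    "art": 6,        # art or music school
    "teacher": 7,    # teacher college
    "private": 8,    # private institution
    "general": 9,    # general university
}


def extended_dummies(types, former_gdr):
    """Return {k: d_i^(k)} for k = 4..10.

    types:      set of type labels that apply to institution i
    former_gdr: True if institution i lies in the former GDR (sets d^(10))
    """
    unknown = set(types) - TYPE_DUMMIES.keys()
    if unknown:
        raise ValueError(f"unknown institution type(s): {sorted(unknown)}")
    row = {k: int(label in types) for label, k in TYPE_DUMMIES.items()}
    row[10] = int(bool(former_gdr))
    return row


print(extended_dummies({"art"}, former_gdr=True))
```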



4. Estimation Results 

In this section, we discuss the estimation results of the negative binomial spatial interaction model defined by Eq. (5). The dependent variable is the observed number of friendships between institution $i$ and the University of Bielefeld. The independent variables are the origin and separation measures defined in Section 3. We consider a basic model version including separation measures for the logarithmic geographic distance $d^{(1)}$, the same state $d^{(2)}$, and the same category $d^{(3)}$. In an extended model version, we replace $d^{(3)}$ with specific dummy variables for each category ($d^{(k)}$ for $k = 4, 5, \ldots, 9$) and add the former-GDR dummy $d^{(10)}$. The extended model serves to check the robustness of the other model parameters, $\beta^{(1)}$ and $\beta^{(2)}$, while also providing additional information on specific effects related to different institution types.
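
For readers who want to try this kind of fit outside R, the sketch below estimates the coefficients $(\gamma, \alpha, \beta^{(1)}, \ldots)$ of the log-link negative binomial model by Fisher scoring, a close relative of the Newton-Raphson procedure used here that replaces the observed Hessian with its expectation (for a Poisson model with log link the two coincide). It is a simplification: the dispersion $\delta$ is held fixed rather than estimated, whereas a full fit would also maximize the likelihood over $\delta$, for example by alternating with a one-dimensional search. The design-matrix layout and all names are our assumptions, and the code requires NumPy.

```python
import numpy as np


def fit_nb_coefficients(X, y, delta, n_iter=50, tol=1e-10):
    """Fisher scoring for the log-link NB2 model at a fixed dispersion delta.

    X:     (n, p) design matrix, columns [1, log a_i, d_i^(1), ..., d_i^(Nd)]
    y:     (n,) observed friendship counts
    delta: dispersion parameter, held fixed in this sketch
    Returns the coefficient vector [gamma, alpha, beta_1, ...].
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Starting values: least squares on log(y + 0.5), a common heuristic.
    coef, *_ = np.linalg.lstsq(X, np.log(y + 0.5), rcond=None)
    for _ in range(n_iter):
        mu = np.exp(X @ coef)
        denom = 1.0 + delta * mu
        score = X.T @ ((y - mu) / denom)        # gradient of the log-likelihood
        weights = mu / denom                    # expected information per observation
        info = X.T @ (X * weights[:, None])     # expected information matrix
        step = np.linalg.solve(info, score)
        coef = coef + step
        if np.max(np.abs(step)) < tol:
            break
    return coef
```

In an application like the one here, `X` would hold a column of ones, the log enrollments $\ln a_i$, and the separation measures, and `y` the friendship counts for the 304 institutions.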

In Table 1, we present the sample estimates of the spatial interaction model parameters, along with tests of statistical significance (Greene, 2003). The second and third columns contain modeling results for the basic and extended models, respectively. There are 304 observations. The estimate of the dispersion parameter $\delta$ is 0.51 in the basic model version and 0.45 in the extended model version; both estimates are statistically significant. Thus, we conclude that unobserved heterogeneity not captured by the covariates leads to overdispersion (Cameron and Trivedi, 1998), and that the negative binomial specification of Eq. (5) is a more appropriate choice than, e.g., a simpler Poisson specification.
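
To see what the dispersion estimates imply, recall that in the NB2 form of the negative binomial model (Cameron and Trivedi, 1998) the variance is $\mu + \delta\mu^2$, whereas a Poisson model forces the variance to equal the mean. The helpers below, with names of our choosing, make the comparison concrete for the basic-model estimate $\delta = 0.51$.

```python
def nb_variance(mu, delta):
    """Variance of an NB2 count with mean mu and dispersion delta."""
    return mu + delta * mu ** 2


def poisson_variance(mu):
    """Poisson variance equals the mean (the delta -> 0 limit)."""
    return mu


# At a mean of 100 friendships, delta = 0.51 implies a variance of about 5200,
# versus 100 under a Poisson model.
for mu in (1, 10, 100):
    print(mu, poisson_variance(mu), nb_variance(mu, 0.51))
```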

The model diagnostics support the statistical significance of the models: the likelihood ratio test is significant for both model versions. To compare the two models, we draw on the Bayesian Information Criterion (BIC), which is widely used for comparing the fit of non-nested models (see, for instance, Raftery, 1995). The extended model version has the lower BIC and thus fits the data better. This is also reflected by McFadden's R² and Nagelkerke's R² in terms of the variance explained by the covariates (for a formal description of the model diagnostics used, see Long and Freese, 2001).
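
The fit statistics in Table 1 can be recomputed from a model's log-likelihood and likelihood-ratio statistic. The definitions below are our reading of Long and Freese (2001): the null (intercept-only) log-likelihood is backed out from the LR statistic, the AIC is reported per observation, and the BIC is the deviance minus the residual degrees of freedom times $\ln N$. The parameter count of 12 for the extended model ($\gamma$, $\alpha$, nine separation parameters, and $\delta$) is our own count.

```python
import math


def fit_statistics(loglik, lr_chi2, n_params, n_obs):
    """Model diagnostics from a log-likelihood and its LR chi-square.

    loglik:   log-likelihood of the fitted model
    lr_chi2:  likelihood-ratio statistic against the intercept-only model
    n_params: number of estimated parameters K
    n_obs:    number of observations N
    """
    ll0 = loglik - lr_chi2 / 2.0                      # null log-likelihood
    mcfadden_adj = 1.0 - (loglik - n_params) / ll0
    cox_snell = 1.0 - math.exp(2.0 * (ll0 - loglik) / n_obs)
    nagelkerke = cox_snell / (1.0 - math.exp(2.0 * ll0 / n_obs))
    aic_per_obs = (-2.0 * loglik + 2.0 * n_params) / n_obs
    bic = -2.0 * loglik - (n_obs - n_params) * math.log(n_obs)
    return {"mcfadden_adj": mcfadden_adj, "nagelkerke": nagelkerke,
            "aic": aic_per_obs, "bic": bic}


print(fit_statistics(-2085.83, 701.37, 12, 304))
```

With the extended-model values from Table 1 (log-likelihood -2085.83, LR statistic 701.37, N = 304), these formulas give McFadden's adjusted R² of 0.139, Nagelkerke's R² of 0.900, an AIC of 13.802, and a BIC of about 2502.29, matching that column to the reported precision.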



Turning our attention first to the results of the basic model in the second column, we note that increasing travel time between the University of Bielefeld and other institutions $i$ has a significant negative effect on the number of friendships.
Parameter                       Basic             Extended
Destination constant γ          9.37 (0.49)***    9.59 (0.55)***
Origin exponent α               0.76 (0.03)***    0.71 (0.05)***
Log geographic distance β(1)    -1.45 (0.09)***   -1.43 (0.08)***
Same state β(2)                 -0.32 (0.16)*     -0.32 (0.15)*
Same category β(3)              -0.58 (0.12)***
Applied university β(4)                           -0.56 (0.16)***
Religious institution β(5)                        -0.98 (0.24)***
Art school β(6)                                   -0.73 (0.21)***
Teacher college β(7)                              -0.26 (0.31)
Private β(8)                                      -0.28 (0.21)
General university β(9)                           0.20 (0.17)
Former GDR β(10)                                  -0.27 (0.11)*
Dispersion parameter δ          0.51 (0.04)***    0.45 (0.04)***
McFadden's adjusted R²          0.115             0.139
Cragg-Uhler (Nagelkerke) R²     0.846             0.900
Log-likelihood                  -2106.52          -2085.83
Likelihood ratio χ²             659.99            701.37
Akaike Information Criterion    14.998            13.802
Bayesian Information Criterion  2609.460          2502.287

*p < 0.05, **p < 0.01, ***p < 0.001

Table 1: Parameter estimates for the spatial interaction models. Fit parameters for the spatial interaction models are based on 304 institutions. Parenthetical values show the standard errors. Asterisks indicate the statistical significance of the parameter fits. Performance statistics indicate that the extended model better explains the data.
This result indicates that StudiVZ is not a virtual world in which geographic location is unimportant. The parameter estimate $\beta^{(1)} = -1.45$ in the basic model indicates that for each additional 100 minutes of travel time to the University of Bielefeld, the expected number of friendships decreases by 91%. The parameter $\beta^{(2)}$ is estimated to be -0.32 and is statistically significant, indicating that the rate of acquaintanceship decreases when institution $i$ is located in a different state than the University of Bielefeld. However, more important than whether $i$ is located in the same state as the University of Bielefeld is whether $i$ is in the same institutional category. The parameter $\beta^{(3)}$ is estimated to be -0.58, and reveals an interesting tendency: students at the University of Bielefeld, which is a general university, tend to have a higher rate of acquaintanceship with students from other general universities than with students attending other types of institutions.

The parameter estimates of the extended model, displayed in the third column, include more detailed information about how region and institution type affect the rate of acquaintanceship. While the estimate for the same state remains unchanged at -0.32, our estimate for the former-GDR parameter $\beta^{(10)}$, -0.27, indicates that institutions located in former East Germany tend to have lower rates of acquaintanceship than those located in former West Germany. Furthermore, the more specific treatment of institution type in the extended model reveals that UniBi has especially low acquaintanceship rates with religious and art-focused institutions, while the negative effects associated with applied technical colleges, private institutions, and teacher colleges are less pronounced.
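
Because the model has a log link, each coefficient acts multiplicatively: switching a dummy separation measure $d^{(k)}$ from 0 to 1, with all other covariates fixed, multiplies the expected number of friendships by $\exp(\beta^{(k)})$. The helper below (our own) converts extended-model point estimates from Table 1 into percentage changes; it ignores estimation uncertainty.

```python
import math


def percent_change(beta):
    """Percent change in the expected count when a dummy d^(k) goes from 0 to 1,
    holding all other covariates fixed: 100 * (exp(beta) - 1)."""
    return 100.0 * (math.exp(beta) - 1.0)


# Extended-model point estimates from Table 1
estimates = {"religious institution": -0.98, "art school": -0.73, "former GDR": -0.27}
for name, beta in estimates.items():
    print(f"{name}: {percent_change(beta):+.1f}%")
# religious institution: -62.5%
# art school: -51.8%
# former GDR: -23.7%
```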

In the extended model version, the parameter estimate of log travel time $\beta^{(1)}$ slightly decreases in magnitude to -1.43, indicating that the estimate is quite robust. The slight decrease might suggest that, in the basic model, travel time is to some extent a proxy for other unobserved characteristics that are better reflected by the additional independent variables in the extended model.



5. Conclusion 

In this paper, we have used spatial interaction models to investigate StudiVZ friendships aggregated at the level of institutions of higher education. We have focused on identifying institutional attributes that are related to the strength of ties between institutions. Our results indicate that geographic distance is the most important separation variable: for each additional 100 minutes of drive time separating UniBi from some other institution, the mean acquaintanceship rate with that institution drops by approximately 90%. StudiVZ is thus not a virtual world in which physical distance is insignificant. This finding is in agreement with a previous study of LiveJournal, a blogging service with SNS features (Liben-Nowell et al., 2005).



Additionally, we found that students at UniBi are most likely to form relationships with students at "peer institutions," i.e., students at other universities and technical universities, and that they are less likely to form acquaintanceships with students at teacher colleges, applied technical colleges, art schools, and religious institutions. Acquaintanceship is also (albeit less prominently) affected by regional characteristics: students at UniBi are more often acquainted with students in their own federal state (North Rhine-Westphalia), and, to about the same extent, they are more frequently friends with students from former West Germany than with students from former East Germany.

One important line of future work is to confirm these results with network 
data that is more complete. Our results are based on only those edges that 
have one end connected to a student at UniBi, and they are therefore skewed to 
reflect the preferences of students at that institution. With complete network 
data one could test whether these preferences closely resemble those of German 
students in general. 

More broadly, the spatial interaction modeling approach taken in this paper could be applied to other networks and with other separation measures. Depending on the data available, models of greater sophistication, either conceptually or methodologically, may become relevant. For example, spatial interaction models may include production constraints or attraction constraints (Sen and Smith, 1995), and spatial autocorrelation may be taken into account using spatial filtering techniques (Fischer and Griffith, 2008).